# DPO reinforcement learning
## Bielik 1.5B V3.0 Instruct
- Author: speakleash
- License: Apache-2.0
- Tags: Large Language Model · Transformers · Other
- Stats: 780 · 8

Bielik-1.5B-v3-Instruct is a 1.6-billion-parameter Polish generative text model, instruction-tuned from Bielik-1.5B-v3 and developed by SpeakLeash in collaboration with ACK Cyfronet AGH.
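For orientation, here is a minimal sketch of querying one of the listed instruction-tuned checkpoints with the Hugging Face transformers library. The repo id `speakleash/Bielik-1.5B-v3.0-Instruct` is an assumption inferred from this card (author plus model name) and should be verified on the Hub; the Polish prompt is only an example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "speakleash/Bielik-1.5B-v3.0-Instruct"  # assumed repo id -- verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Instruction-tuned checkpoints usually ship a chat template with the tokenizer.
messages = [{"role": "user", "content": "Napisz krótki wiersz o górach."}]  # "Write a short poem about mountains."
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```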
## Calme 2.1 Qwen2.5 72b
- Author: MaziyarPanahi
- License: Other
- Tags: Large Language Model · Transformers · English
- Stats: 155 · 3

An advanced language model fine-tuned from Qwen/Qwen2.5-72B-Instruct, aimed at natural language understanding and generation.
## Orca Mini V5 8b Dpo
- Author: pankajmathur
- Tags: Large Language Model · Transformers · English
- Stats: 16 · 3

An 8B-parameter model based on the Llama 3 architecture, trained on various DPO datasets and focused on text generation tasks.
## Llama 3 8B Instruct 64k
- Author: MaziyarPanahi
- Tags: Large Language Model · Transformers · English
- Stats: 91 · 12

An 8B-parameter large language model built on winglian/Llama-3-8b-64k-PoSE, using PoSE to extend the context length to 64k tokens and further optimized with DPO fine-tuning.
## TC Instruct DPO
- Author: tanamettpk
- License: Apache-2.0
- Tags: Large Language Model · Transformers · Multilingual
- Stats: 28 · 10

A Thai instruction-following model fine-tuned from Typhoon-7B using Direct Preference Optimization (DPO).
## Phi2 Chinese 0.2B
- Author: charent
- License: Apache-2.0
- Tags: Large Language Model · Transformers · Multilingual
- Stats: 65 · 30

A 200-million-parameter Chinese causal language model based on the Phi2 architecture, supporting text generation tasks.
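Most entries in this list were preference-tuned with Direct Preference Optimization. As a reference point, the sketch below shows the DPO objective on per-sequence log-probabilities in plain PyTorch; it is a generic illustration of the loss, not the training recipe of any listed model, and the tensor values in the usage example are made up.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is a tensor of summed log-probabilities of a full response
    under the trained policy or the frozen reference model.
    """
    # Implicit rewards: how much more likely each response is under the policy
    # than under the reference model, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # DPO objective: negative log-sigmoid of the chosen-vs-rejected reward margin.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for a batch of two preference pairs.
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.3, -9.8]),
    policy_rejected_logps=torch.tensor([-14.1, -10.2]),
    ref_chosen_logps=torch.tensor([-12.0, -10.0]),
    ref_rejected_logps=torch.tensor([-13.5, -10.1]),
)
print(loss.item())
```

The beta term controls how strongly the policy is kept close to the reference model: larger values penalize drift from the reference more heavily.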